Programming with Annotated Grammar Estimation
Abstract
Evolutionary algorithms (EAs) mimic natural evolution to solve optimization problems. Because EAs do not require detailed assumptions about the problem, they can be applied to many real-world problems. In EAs, solution candidates are evolved using genetic operators such as crossover and mutation, which are analogous to processes in natural evolution. In recent years, EAs have been considered from the viewpoint of distribution estimation, with estimation of distribution algorithms (EDAs) attracting much attention ([14]). Although genetic operators in EAs are inspired by natural evolution, EAs can also be considered as algorithms that sample solution candidates from distributions of promising solutions. Since these distributions are generally unknown, approximation schemes are applied to perform the sampling. Genetic algorithms (GAs) and genetic programming (GP) approximate the sampling by randomly changing the promising solutions via genetic operators (mutation and crossover). In contrast, EDAs assume that the distributions of promising solutions can be expressed by parametric models, and they repeatedly learn a model and sample from the learnt model. Although GA-type sampling (mutation or crossover) is easy to perform, it is valid only when two structurally similar individuals have similar fitness values (e.g. the one-max problem). GA and GP have shown poor search performance in deceptive problems ([6]), where this condition is not satisfied. EDAs, however, have been reported to show much better search performance on some problems that GA and GP do not handle well. As in GAs, EDAs usually employ fixed length linear arrays to represent solution candidates (these EDAs are referred to as GA-EDAs in the present chapter). Over the past decade, EDAs have been extended to handle programs and functions having tree structures (we refer to these as GP-EDAs in the present chapter).
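The learn-then-sample loop of a GA-EDA described above can be sketched on the one-max problem. The sketch below uses the univariate marginal distribution algorithm (UMDA), a simple EDA chosen here only for illustration; it is not the method of this chapter. The function names, population sizes, and the probability clamp are assumptions made for the example.

```python
import random


def onemax(bits):
    """Fitness for the one-max problem: the number of ones in the array."""
    return sum(bits)


def umda(n_bits=20, pop_size=100, n_select=50, generations=30, seed=0):
    """Illustrative UMDA: one independent Bernoulli parameter per position.

    All parameter values are assumptions chosen for this sketch.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # initial model: every bit is 1 with probability 0.5
    best = None
    for _ in range(generations):
        # Sampling step: draw fixed length linear arrays from the current model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=onemax, reverse=True)
        if best is None or onemax(population[0]) > onemax(best):
            best = population[0]
        # Selection step: keep the promising solutions (truncation selection).
        selected = population[:n_select]
        # Learning step: re-estimate each marginal from the selected individuals.
        probs = [sum(ind[i] for ind in selected) / n_select for i in range(n_bits)]
        # Clamp the marginals so that no bit value becomes impossible to sample.
        probs = [min(max(p, 0.05), 0.95) for p in probs]
    return best, probs


if __name__ == "__main__":
    best, probs = umda()
    print("best fitness:", onemax(best))
    print("learnt marginals:", [round(p, 2) for p in probs])
```

Because the model has a fixed number of parameters, one per array position, learning reduces to counting. This is the property that does not carry over directly to tree-structured individuals.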
Since tree structures have different numbers of nodes, model learning is much more difficult than in GA-EDAs. From the viewpoint of model type, GP-EDAs can be broadly classified into two groups: probabilistic prototype tree (PPT) based methods and probabilistic context-free grammar (PCFG) based methods. PPT-based methods employ techniques devised in GA-EDAs by transforming variable length tree structures into fixed length linear arrays. PCFG-based methods employ
Similar resources
Probabilistic Unification Grammars
Recent research has shown that unification grammars can be adapted to incorporate statistical information, thus preserving the processing benefits of stochastic context-free grammars while offering an efficient mechanism for handling dependencies. While complexity studies show that a probabilistic unification grammar achieves an appropriately lower entropy estimate than an equivalent PCFG, the ...
Program distribution estimation with grammar models
This research extends conventional Estimation of Distribution Algorithms (EDA) to the Genetic Programming (GP) domain. We propose a framework to estimate the distribution of solutions in tree form. The core of this framework is a grammar model. In this research, we show, both theoretically and experimentally, that a grammar model has many of the properties we need for estimation of distribution for...
Data-Driven Compilation of LFG Semantic Forms
In a recent paper, van Genabith et al. (1999) describe a semi-automatic method for annotating tree banks with high level Lexical Functional Grammar (LFG) f-structure representations. First, a CF-PSG is automatically induced from the tree bank using the method described in (Charniak, 1996). The CF-PSG is then manually annotated with functional schemata. The resulting LFG is then used to determin...
Extracting a bilingual semantic grammar from FrameNet-annotated corpora
We present the creation of an English-Swedish FrameNet-based grammar in Grammatical Framework. The aim of this research is to make existing framenets computationally accessible for multilingual natural language applications via a common semantic grammar API, and to facilitate the porting of such grammar to other languages. In this paper, we describe the abstract syntax of the semantic grammar w...
Cache-based Dynamic PCFG Adaptation using MAP Estimation
This paper presents a cache-based dynamic adaptation technique for lexicalized probabilistic context-free grammar (LPCFG). Expected counts from machine-parsed sentences of in-domain data are stored in a cache and combined with prior counts from hand-annotated parses of out-of-domain data using maximum a posteriori (MAP) estimation. This adaptation is unsupervised, and dynamic with an adap...
A Statistical Dependency Parser of Chinese under Small Training Data
Parsing is an important and difficult problem in natural language processing. In this paper a probabilistic parsing model is proposed in terms of dependency grammar. A small manually annotated corpus serves as training data because the large-scale Chinese Treebank is unavailable at present. Dependency relations of the part-of-speech tags are obtained from training data, and their probabilities ar...